Reluctant Paraphrase: 
Textual Restructuring under an Optimisation Model 

Abstract 

This paper develops a computational model of paraphrase under which text mod- 
ification is carried out reluctantly; that is, there are external constraints, such as 
length or readability, on an otherwise ideal text, and modifications to the text are 
necessary to ensure conformance to these constraints. This problem is analogous to 
a mathematical optimisation problem: the textual constraints can be described as a 
set of constraint equations, and the requirement for minimal change to the text can 
be expressed as a function to be minimised; so techniques from this domain can be 
used to solve the problem. 

The work is done as part of a computational paraphrase system using the XTAG 
system H as a base. The paper will present a theoretical computational framework for 
working within the Reluctant Paraphrase paradigm: three types of textual constraints 
are specified, effects of paraphrase on text are described, and a model incorporating 
mathematical optimisation techniques is outlined. 



1 Framework 

The work this paper describes is done as part of a 
computational paraphrase system using the XTAG 
system || as a base. Although the goal of the system 
is to modify text to achieve some objective, it is fun- 
damentally unlike existing systems which paraphrase 
text, such as style checkers [13], in that the context 
of paraphrasing is different; this context, Reluctant 
Paraphrasing, is described below, with a theoretical 
framework for the paraphrasing presented in the rest 
of the paper. 

Reluctant Paraphrase (RP) can best be defined by 
contrasting it with the remedial sort of paraphrases 
suggested by style checkers, or in style guides such 
as Strunk and White [17], and so on. The starting 
point under this remedial style of paraphrase is an 
imperfect text which has to be corrected, the correc- 
tions being determined by some prescriptive advice 
such as "make the text more active" . The text is run 
through a style checker, or past an editor, and flaws of 
vocabulary or grammar or style are corrected. In con- 
trast, imagine the completion of an ideal document: 
it says exactly what the author intends, and no more; 
every word captures all the nuances the author wants 
to convey. However, it has to be changed because of 
external constraints. These constraints might be the 
need to cut down an academic paper by one page for 
conference publication; or the need to make a techni- 
cal document conform to house style readability re- 
quirements; or some combination of these or other 
sorts of external constraints. Thus, the text has to 
be paraphrased, albeit reluctantly, in order to meet 
these externally imposed constraints. 

Dealing with this reluctant sort of paraphrase, 
rather than the remedial sort, has a number of ad- 
vantages. Firstly, it avoids representational problems 
that are otherwise inherent in paraphrasing. In reme- 
dial paraphrasing, paraphrase requirements can be of 
arbitrary complexity, ranging from "change sentence 
voice" to "fix incoherent theme" . This arbitrariness 
of complexity makes developing a consistent repre- 
sentation near impossible. However, under RP the 
paraphrases don't embody the correction in the same 
way that remedial paraphrases do; instead, they are 
just tools which are used to alter the text so that it 
conforms to the imposed constraints. Given that the 
paraphrases are just tools, it is possible to pick a lim- 
ited set of them and still attempt to cover all of them 
with a consistent representation. 



Secondly, it avoids the debate about making text 
'better'. There are longstanding arguments in the lit- 
erature about particular techniques and their efficacy 
in improving text: examples are the passive to ac- 
tive voice paraphrase, relative pronoun deletion and 
the avoidance of nominalisation. In RP, by contrast, 
taking the standpoint that the original text is ideal 
means that any change will be undesirable, so only 
the minimal level of change to the text in keeping 
with the constraint satisfaction should be made. 

The computational paraphrase system within RP 
that this paper discusses thus has three components: 
a set of paraphrase techniques which is used to 
achieve the text modification; a set of constraints 
to which the text must adhere after the modifica- 
tion; and an effect — that of the change to the text 
caused by the paraphrases applied — which is to be 
minimised. This parallels closely a mathematical op- 
timisation model, with, respectively, a set of decision 
variables, a set of constraint equations and an opti- 
misation function. The rest of this paper presents a 
formulation of RP which draws on ideas from the field 
of mathematical optimisation: Section 2 discusses numeric 
constraints on text; Section 3 looks at quantifying 
text effects of paraphrases; and Section 4 describes 
the actual model. 

2 Textual Constraints 

This section describes three measures of text, those 
of length, readability and lexical density. These 
measures are often used in the production of text; 
their numeric quality is what makes them particularly 
amenable to the optimisation model of this paper. 

Length is the simplest measure, and is frequently 
used in practice as a constraint. For example, re- 
stricting the length of a text is standard for academic 
conferences — like this conference with its 3000 word 
limit on abstracts — and meeting this constraint often 
involves cutting down a longer draft version. It is also 
typical in other areas such as the editing of newspaper 
text ||. Constraining text length is also a feature of 
computational language generation systems, either as 
a general directive implementing the Gricean maxim 
of conciseness, as in the Epicure system M, or as an 
explicit limit on the length of an individual text unit, 
as in the Streak system [16]. 

Another common measure comes from readability 
formulae, such as the Flesch Reading Ease Score or 
the Dale-Chall formula. Standard readability formulae 
are basically equations which attempt to predict, 
rather than evaluate, the readability of text; in 
form they are generally linear combinations of factors 
which correlate with text complexity. These factors 
are of fairly simple types: a measure of sentence 
complexity, usually average sentence length; and a 
measure of word complexity, such as average word length 
in syllables, or proportion of infrequent words. The 
weightings for these terms are assigned by calculating 
a correlation with tests of readers' comprehension. 
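
As an illustration of this linear form, the Flesch Reading Ease score combines average sentence length and average syllables per word with fixed weights. The sketch below assumes the three counts are already available; the function name `flesch_reading_ease` and its signature are chosen here for illustration only.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: a linear combination of two text factors."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    avg_sentence_length = words / sentences      # sentence complexity factor
    avg_syllables_per_word = syllables / words   # word complexity factor
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word
```

With the word and syllable counts held fixed, splitting the text into more sentences lowers the average sentence length and so raises the score, which is the lever the readability constraint relies on.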

The most accurate way of determining readabil- 
ity would be by testing readers' comprehension di- 
rectly. However, this would be expensive in terms of 
time and other resources; readability formulae were 
constructed as an attempt to predict the readability 
that would be measured by these tests. This, together 
with the numerical phrasing of the readability, is the 
reason for using readability formulae here. Moreover, 
the faults of readability formulae (documented in, for 
example, [7]) are not significant in the context of RP, 
for a number of reasons. 

Firstly, use of readability formulae can be de- 
fended on practical grounds: readability formulae are 
used as criteria for writing public documents in the 
US, such as insurance policies, tax forms, contracts 
and jury instructions q|, for producing military doc- 
uments |T^], and so on. In these situations the use 
of readability formulae is mandatory; so for a system 
which models realistic constraints on text, using the 
formulae as a constraint is reasonable. 

Secondly, most objections are based on the use of 
readability formulae in the strong sense — when actual 
readability levels are predicted — rather than when 
used in their weak sense — when readability formulae 
are used to rank texts relative to each other in order 
of reading complexity; and under Reluctant 
Paraphrase, this is not a problem, as the texts, one 
of which is a paraphrase of the other, are just ranked 
relative to each other. 

Lexical density is a textual measure discussed by 
Halliday (Tc[ |; it attempts to capture the 'condensed- 
ness' of text by measuring the proportion of non- 
content (or function) words to total text. Halliday 
uses this idea of condensedness to distinguish between 
written and spoken forms of language: written lan- 
guage tends to be more condensed than spoken, with 
constructions of type (1a) more prevalent in writing 
and those of type (1b) more prevalent in speech. 

(1) a. Sex determination varies in different or- 
ganisms. 

b. The way sex is determined varies in differ- 
ent organisms. 

The concept is also useful in the context of this 
paper's optimisation model, as a constraint counter- 
balancing the readability one. Under a typical read- 
ability formula, the readability value is generally cor- 
related with average sentence length, so the formula 
value can be improved by the sort of paraphrases 
which compress text, such as the mapping of (1b) to 
(1a). Compression to too great an extent can lead to 
text that is difficult to understand; the use of lexical 
density as a constraint can act as a counterweight to 
the readability constraint, to prevent excessive text 
compression. 
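
A minimal sketch of the measure, using this paper's definition (proportion of function words to total words) on the sentences of example (1). The closed-class word list `FUNCTION_WORDS` is a small hypothetical stand-in, just large enough for these two sentences; a real system would use a full closed-class lexicon.

```python
import re

# Hypothetical closed-class word list (an assumption, not from the paper).
FUNCTION_WORDS = {"the", "a", "an", "is", "was", "in", "on", "by", "of", "to", "which"}

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def function_word_proportion(text: str) -> float:
    """Proportion of function words to total words in the text."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    return sum(word in FUNCTION_WORDS for word in words) / len(words)

EXAMPLE_1A = "Sex determination varies in different organisms."
EXAMPLE_1B = "The way sex is determined varies in different organisms."
```

Under this word list, compressing (1b) into (1a) lowers the function-word proportion, so a lower bound on that proportion limits how far such compression can go.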

3 Paraphrases 

As noted in Section 1, paraphrases can be of arbitrary 
complexity. In keeping with their use in RP 
as broad-coverage tools, the most appropriate para- 
phrases, and hence the ones that are used in this 
work, are ones that are syntactic in nature. An ex- 
ample of this type, modelled on work by Jordan pi, 
is the splitting off of a noun post-modifier to form a 
separate sentence: 

(2) a. Sarah warily eyed the page filled with top- 
icalisations and other linguistic phenom- 
ena. 

b. Sarah warily eyed the page. It was filled 
with topicalisations and other linguistic 
phenomena. 

The paraphrases used here are taken from three 
different types of sources: popular (style guides such 
as |Tt| ) ; academic (work on textual analysis involv- 
ing paraphrasing, such as fllTf and [^6| ); and practical 
(the actual practices of people involved in paraphras- 
ing text, such as editors and journalists Q). 

These paraphrases will cause some change to the 
text, and, under RP, any change effected by a para- 
phrase is taken to be a negative one. Developing an 
optimisation model thus requires a quantification of 
the effects that imposing a paraphrase on a text will 
have on that text. The rest of this section sketches 
methods for assigning a quantification to a para- 
phrase, which will lead to a minimisation function for 
the model. There are two types of effects analysed in 
this work, effects on meaning and effects on discourse 
structure. These two types are then combined to give 
the minimisation function. 

3.1 Meaning Effects 

One way in which a paraphrase can affect a text is 
in terms of its truth-conditional meaning; or, in 
Hallidayan terms, its ideational metafunction. A unit of 
text, such as a sentence, can be viewed as a statement 
about the world, which is either true or false^1; an 
alternative, but related, view is that the truth of the 
statement is represented by a set of possible worlds 
in which the statement is true^2. A paraphrase is 
consequently defined more precisely as consisting of two 
sentences where the set of possible worlds in which 
one sentence is true is a (not necessarily proper) 
subset of the possible worlds in which the other is true. 

1 Only declarative sentences are dealt with in this paper. 
2 This is a much simplified summary of work on 
truth-conditional meaning presented in, for example, [11]. 

Take the following examples: 

(3) a. Onlookers scrambled to avoid the car 

which was flashing its headlights. 

b. Onlookers scrambled to avoid the car flash- 
ing its headlights. 

(4) a. The salesman made an attempt to wear 

Steven down. 

b. The salesman attempted to wear Steven 
down. 

(5) a. There was a girl standing in the corner, 
b. There was a girl in the corner. 

(6) a. Tempeste approached Blade, a midnight 

dark and powerful figure, and gave him a 
resounding slap. 

b. Tempeste approached Blade and gave him 
a resounding slap. 

These examples give a range of different magnitudes 
in the size of the sets representing the possible 
worlds in which each of the paraphrase alternatives 
is true. Example (3) represents a fairly minimal 
difference: (3b) can be a paraphrase either of (3a) or of 
Onlookers scrambled to avoid the car which is flashing 
its headlights. The set of possible worlds in which (3b) 
is true is a proper superset of the set of possible worlds 
in which (3a) is true; but intuition suggests the sets are 
relatively close in size, (3b) only covering two different 
cases with respect to the altered constituents. 
Example (4) represents a slightly bigger paraphrase: (4b) 
can paraphrase statements asserting one attempt 
(equivalent to (4a)), two attempts, seven attempts, or 
many attempts. The size of the set difference here is 
consequently relatively larger than in (3). In (5), the 
difference is larger still, in that (5b) can describe 
situations where the girl is sitting, lying, dancing, and 
so on. The largest difference is in (6), where (6b) 
includes in its set of possible worlds, over and above the 
possible worlds in which (6a) is true, worlds in which 
Blade is described by any other appositive. 

A way of approximating the intuition about the 
difference in the relative sizes of possible world sets is 
by using parts of speech. An alteration in less 
significant parts of speech corresponds to a small relative 
difference in set size, and so on. So in (3), the changed 
parts of speech are a relative pronoun, which causes 
no difference in truth-conditional meaning, and the 
auxiliary verb be, which leads to the relatively small 
difference. In comparison, the deletion of the 
open-class constituent in (5), the present participle 
standing, leads to a much greater set difference; and 
deleting multiple open-class words in (6) has a still larger 
effect. 

A possible refinement of this approximation 
involves considering lexical factors. For example, the 
paraphrase in (5) is less significant than if (5a) had 
been the girl coruscating in the corner; the latter 
option is much more unexpected, and so it can be argued 
that its removal alters the text to a much greater 
extent. As they are related to frequency, these lexical 
factors could be estimated through collocational 
analysis within a corpus, although this has not been done 
as yet. 

3.2 Discourse Effects 

As well as affecting the truth-conditional meaning of 
the text, a paraphrase can alter the discourse fea- 
tures of the text; or, in Hallidayan terms again, the 
textual metafunction. Because of the assumption 
behind RP that the author has deliberately chosen 
a particular way of packaging the information in a 
sentence, any paraphrase which alters the packaging 
structure is altering the author's intention and hence 
should be included in the measurement of change and 
the consequent minimisation function. Work in the 
area of information packaging includes [0] and 
fill ; although approaches differ, all have some con- 
cept of syntactic structures reflecting packaging of 
information — which part is known to the reader, and 
which is new. An example is an it-cleft sentence and 
its standard declarative paraphrase: 

(7) a. It was the balcony and its scholarly dis- 
course which irresistibly drew Ryan. 

b. The balcony and its scholarly discourse ir- 
resistibly drew Ryan. 

In (7a), the fact that Ryan has been irresistibly 
drawn is indicated as a given or topic, and the 
balcony-as-drawer as the new piece of information. 
In (7b) there is no such marking. 

A rough numerical measure of this can be gained 
by counting the difference in the questions to which 
the sentence can be an answer. So (7a) can only be 
an answer to the narrow-focus What irresistibly drew 
Ryan?, while (7b) can answer not only this question 
but also What did the balcony and its scholarly 
discourse do to Ryan?, What did the balcony and its 
scholarly discourse do?, or the wide-focus What 
happened?. 
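
One way to turn this count into a number is sketched below for example (7). The question sets are the ones listed in the text; measuring the change as the size of the symmetric difference between the two sets is an assumption made here for illustration, as are the names `ANSWERABLE` and `discourse_change`.

```python
# Questions each variant of example (7) can answer, as listed in the text.
ANSWERABLE = {
    "7a": {"What irresistibly drew Ryan?"},
    "7b": {
        "What irresistibly drew Ryan?",
        "What did the balcony and its scholarly discourse do to Ryan?",
        "What did the balcony and its scholarly discourse do?",
        "What happened?",
    },
}

def discourse_change(original: str, paraphrase: str) -> int:
    """Number of questions answerable by exactly one of the two variants."""
    return len(ANSWERABLE[original] ^ ANSWERABLE[paraphrase])
```

For the cleft and its declarative paraphrase, `discourse_change("7a", "7b")` counts the three questions that only the declarative form can answer.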



4 An Optimisation Approach 

The optimisation model for the computational para- 
phrase system requires a formal specification of the 
paraphrases and their attributes — their effect on the 
text in terms of the parameters, such as number of 
words or sentences, affected by each constraint; and 
their effect on the text's meaning and information 
structure. The paraphrases are formally specified us- 
ing the representation formalism as proposed in j(|; 
however, an informal description of the paraphrase is 
adequate for discussion of the paraphrase effects and 
their inclusion into the optimisation model. 

This section presents a mathematical optimisation 
model of paraphrasing. The basic techniques 
are those of integer programming (see, for example, 
[19]), which describes the constraints and function to 
be minimised in terms of linear combinations of 
integer variables. The integer programming approach 
is useful because it provides a set of techniques for 
guaranteeing an optimal solution, heuristics for 
cutting the search space, and methods for model 
analysis^3. After a formal presentation of the model, an 
example is given for clarification. 

4.1 The Model 

In developing an optimisation model, it is first 
necessary to identify the decision variables: that is, 
those factors about which a decision is to be made. 
In this case, it is the paraphrase mappings: for each 
paraphrase, the decision is whether this paraphrase 
should be applied to the text to move it towards sat- 
isfying the constraints while minimally perturbing the 
text. In this situation, the choice is binary, whether 
or not to apply the paraphrase. Given this, the deci- 
sion variables are 

    $p_{ij}$ = a 0/1 valued variable representing 
    the $j$th potential paraphrase for sentence $i$ 

The objective function, the function to be 
optimised, is, for RP, a measure of the change to the 
text, as described in Section 3. With $c_{ij}$ being the 
effect (or cost) of each paraphrase, if applied, this 
function has the form 

\[ z = \sum_{i,j} c_{ij} \, p_{ij} \]

The constraints take the form "total length must 
be decreased by at least some constant value", or 
"readability value must be no greater than some 
constant value". Expressed mathematically, the length 
constraint is 

\[ \sum_{i,j} w_{ij} \, p_{ij} \le k_1 \]

where 

    $w_{ij}$ = change to length of sentence $i$ 
    caused by paraphrase $ij$ 
    $k_1$ = required change to the length of text 
    in words; $k_1 \le 0$ 

3 This last feature (model analysis) is not discussed in this paper. 
A simplified readability constraint^4, using only 
the average sentence length component, is 

\[ \frac{W + \sum_{i,j} w_{ij} \, p_{ij}}{S + \sum_{i,j} s_{ij} \, p_{ij}} \le k_2 \]

that is, 

\[ \sum_{i,j} (w_{ij} - k_2 \, s_{ij}) \, p_{ij} \le k_2 S - W \]

where 

    $s_{ij}$ = change to number of sentences in 
    the text caused by paraphrase $ij$ 
    $W$ = total number of words in original text 
    $S$ = total number of sentences in original text 
    $k_2$ = required average sentence length^5; $k_2 > 0$ 

The lexical density constraint requires the 
proportion of function words, taken here to be all closed 
class words, to total words to be at least some 
constant value. It has the form 

\[ \frac{F + \sum_{i,j} f_{ij} \, p_{ij}}{W + \sum_{i,j} w_{ij} \, p_{ij}} \ge k_3 \]

that is, 

\[ \sum_{i,j} (f_{ij} - k_3 \, w_{ij}) \, p_{ij} \ge k_3 W - F \]

where 

    $f_{ij}$ = change to number of function words 
    caused by paraphrase $ij$ 
    $F$ = total number of function words in 
    original text 
    $k_3$ = required proportion of function 
    words to total words; $0 < k_3 < 1$ 

4 This simplification means that non-linear, quadratic 
programming techniques do not have to be introduced at this 
stage. 

5 While the choice of a particular $k_1$ is straightforward, 
choosing a reasonable value for $k_2$ requires more effort: for 
example, analysing average sentence length in a corpus which 
satisfies typical readability targets (such as "senior high school 
level" in the Flesch Reading Ease score). The constant $k_3$ can 
be ascertained similarly. 

Given that there are $j$ paraphrases for each 
sentence (with $j$ varying for each sentence), there is a 
potential conflict for the paraphrases. To simplify the 
application of the paraphrases, an extra constraint is 
added, stating that there can be at most one 
paraphrase for each sentence: 

\[ \sum_{j} p_{ij} \le 1 \quad \text{for all } i \]

Although it is possible in particular cases for 
paraphrases to overlap and produce satisfactory text, 
there is no easy way in advance to decide this; so for 
an automated system the above constraint is neces- 
sary, at least until a much more detailed analysis of 
paraphrase interaction has been carried out. 
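
The ratio form of the readability constraint becomes linear once both sides are multiplied by the sentence count, which is positive. The sketch below checks, for every 0/1 assignment, that the ratio form and the linear form of that constraint agree. The function names are introduced here for illustration, and `w`, `s` are flat lists of the coefficients $w_{ij}$ and $s_{ij}$ in a fixed order.

```python
from fractions import Fraction
from itertools import product

def readability_ratio_ok(p, w, s, W, S, k2):
    """Ratio form: (W + sum w*p) / (S + sum s*p) <= k2."""
    words = W + sum(wi * pi for wi, pi in zip(w, p))
    sentences = S + sum(si * pi for si, pi in zip(s, p))
    return Fraction(words, sentences) <= k2

def readability_linear_ok(p, w, s, W, S, k2):
    """Linear form: sum (w - k2*s) * p <= k2*S - W."""
    lhs = sum((wi - k2 * si) * pi for wi, si, pi in zip(w, s, p))
    return lhs <= k2 * S - W

def forms_agree(w, s, W, S, k2):
    """True if both forms accept exactly the same 0/1 assignments."""
    return all(
        readability_ratio_ok(p, w, s, W, S, k2)
        == readability_linear_ok(p, w, s, W, S, k2)
        for p in product((0, 1), repeat=len(w))
    )
```

Exact rational arithmetic (`Fraction`) is used so that boundary cases, where the average sentence length equals $k_2$ exactly, are not disturbed by floating-point rounding. The same multiplication step gives the linear form of the lexical density constraint.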

An example is presented in the next section, to 
illustrate the model. The small size of this example 
does not allow a real demonstration of the usefulness 
of the approach, since the problem can be solved al- 
most by inspection. However, in larger problems this 
method of modelling allows the use of techniques such 
as branch-and-bound [19], which make the solution of 
the problem feasible, where the solution would 
otherwise be impractical because of the problem's 
exponential complexity. 
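
To make the idea concrete, here is a minimal branch-and-bound sketch for a 0/1 minimisation problem. It is not the implementation used in this work: it assumes non-negative costs, uses the cost of the partial assignment as the lower bound, and tests feasibility only on complete assignments through a caller-supplied predicate. Practical solvers typically use much tighter bounds, for example from a linear programming relaxation.

```python
def branch_and_bound(costs, is_feasible):
    """Minimise sum(costs[i] * p[i]) over 0/1 vectors p accepted by is_feasible.

    Assumes all costs are non-negative, so the cost of a partial assignment
    is a valid lower bound on the cost of every completion of it.
    Returns (best assignment or None, best cost, number of nodes visited).
    """
    best_cost, best_p = float("inf"), None
    visited = 0

    def search(prefix, cost_so_far):
        nonlocal best_cost, best_p, visited
        visited += 1
        if cost_so_far >= best_cost:       # bound: no completion can improve
            return
        if len(prefix) == len(costs):      # leaf: a complete assignment
            if is_feasible(prefix):
                best_cost, best_p = cost_so_far, tuple(prefix)
            return
        i = len(prefix)
        search(prefix + [0], cost_so_far)              # branch: do not apply
        search(prefix + [1], cost_so_far + costs[i])   # branch: apply

    search([], 0)
    return best_p, best_cost, visited
```

The `visited` count shows how much of the full binary tree is explored; whenever a good solution is found early, whole subtrees whose partial cost already matches or exceeds it are skipped.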

4.2 An Example 

As an example, take the short text: 

(8) a. The cat sat on the mat which was by the 

door. 

b. It ate the cream ladled out by its owner. 

c. The owner, an eminent engineer, had a 
convertible used in a bank robbery. 

The values of F, W and S are 17, 33 and 3 re- 
spectively. 

Possible paraphrases of individual sentences, us- 
ing just relative pronoun deletion, post-modifier split, 
and parenthetical deletion, are: 

(9) $p_{11}$. The cat sat on the mat by the door. 

    $p_{21}$. It ate the cream. It had been ladled out 
    by its owner. 

    $p_{31}$. The owner, an eminent engineer, had a 
    convertible. It had been used in a bank 
    robbery. 

    $p_{32}$. The owner had a convertible used in a 
    bank robbery. 





                         number of words   avg sent. length
original text                 1791              24.88
num. words minimised          1531              23.92
avg sent. minimised           1784              17.66

Table 2: Maximal text flexibility 



paraphrase ij    $f_{ij}$    $w_{ij}$    $s_{ij}$
     11             -2          -2           0
     21             +3          +3          +1
     31             +3          +3          +1
     32             -1          -3           0

Table 1: Variable coefficients 



This gives decision variables $p_{11}$, $p_{21}$, $p_{31}$ and $p_{32}$, 
with associated coefficients in Table 1. 
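
The word and sentence coefficients can be recomputed directly from the sentences in (8) and (9). The sketch below does this with simple regular-expression counts; the helper names are illustrative, and the function-word coefficients $f_{ij}$ are left out because they depend on a closed-class word list that is not given here.

```python
import re

ORIGINAL = {
    1: "The cat sat on the mat which was by the door.",
    2: "It ate the cream ladled out by its owner.",
    3: "The owner, an eminent engineer, had a convertible used in a bank robbery.",
}

PARAPHRASES = {
    (1, 1): "The cat sat on the mat by the door.",
    (2, 1): "It ate the cream. It had been ladled out by its owner.",
    (3, 1): "The owner, an eminent engineer, had a convertible. "
            "It had been used in a bank robbery.",
    (3, 2): "The owner had a convertible used in a bank robbery.",
}

def count_words(text: str) -> int:
    """Count alphabetic word tokens."""
    return len(re.findall(r"[A-Za-z]+", text))

def count_sentences(text: str) -> int:
    """Count sentence-final punctuation marks."""
    return len(re.findall(r"[.!?]", text))

def coefficients() -> dict:
    """Map each paraphrase (i, j) to its (w_ij, s_ij) pair."""
    return {
        (i, j): (
            count_words(text) - count_words(ORIGINAL[i]),
            count_sentences(text) - count_sentences(ORIGINAL[i]),
        )
        for (i, j), text in PARAPHRASES.items()
    }
```

The totals over `ORIGINAL` give the values of $W$ and $S$ stated for the example text.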

For the example, the constraint values are 
(arbitrarily) chosen as $k_1 = 0$ (at worst no compression 
of text length), $k_2 = 10$ (average sentence length no 
greater than 10), and $k_3 = 0.525$ (function words no 
less than 52.5% of the text). 

Through the process of integer programming, 
there are two alternatives which are feasible solutions: 

    $p_{11} = p_{21} = p_{31} = 0$, $p_{32} = 1$ 

    $p_{31} = 0$, $p_{11} = p_{21} = p_{32} = 1$ 

This gives two values for the objective function, 
$z = c_{32}$ and $z = c_{11} + c_{21} + c_{32}$. Since $c_{ij} > 0$ 
for all $(i, j)$ (under the Reluctant Paraphrase assumption all 
changes involve a positive cost), the best alternative 
is the first, with only the second paraphrase for 
sentence number three being applied. The resulting text 
is then: 

(10) a. The cat sat on the mat which was by the 
door. 

b. It ate the cream ladled out by its owner. 

c. The owner had a convertible used in a 
bank robbery. 
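
Because the example has only four decision variables, the result can be checked by enumerating all sixteen assignments. The sketch below does so with exact arithmetic. It rests on several assumptions: all constraints are taken as non-strict inequalities, $s_{ij}$ is taken as 0 for the two paraphrases that do not split a sentence, and the unit costs used in the usage note are hypothetical, since the example leaves the $c_{ij}$ symbolic.

```python
from fractions import Fraction
from itertools import product

# Coefficients in the order p11, p21, p31, p32 (Table 1).
f = (-2, 3, 3, -1)   # change in function words
w = (-2, 3, 3, -3)   # change in words
s = (0, 1, 1, 0)     # change in sentences (0 assumed where no split occurs)
F, W, S = 17, 33, 3
k1, k2, k3 = 0, 10, Fraction(21, 40)   # k3 = 0.525 as an exact fraction

def dot(coeffs, p):
    """Linear combination of coefficients and 0/1 decisions."""
    return sum(c * x for c, x in zip(coeffs, p))

def feasible(p):
    """Check the four constraints of the model for one assignment."""
    length_ok = dot(w, p) <= k1
    readability_ok = dot(w, p) - k2 * dot(s, p) <= k2 * S - W
    density_ok = dot(f, p) - k3 * dot(w, p) >= k3 * W - F
    one_per_sentence = p[2] + p[3] <= 1   # only sentence 3 has two options
    return length_ok and readability_ok and density_ok and one_per_sentence

def solve(costs):
    """Return (cheapest feasible assignment, all feasible assignments)."""
    candidates = [p for p in product((0, 1), repeat=4) if feasible(p)]
    return min(candidates, key=lambda p: dot(costs, p)), candidates
```

Calling `solve((1, 1, 1, 1))` selects the assignment with only $p_{32}$ applied, in line with the result above. With the length constraint read as non-strict, the enumeration also admits $p_{21} = p_{32} = 1$, which leaves the length unchanged; its cost $c_{21} + c_{32}$ still exceeds $c_{32}$, so the optimum is unaffected.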

4.3 Actual Text 

Current work involves applying this technique to ac- 
tual text, taken from the periodical The Atlantic 
Monthly. This source was chosen as it has reasonably 
complex text on which a large range of paraphrases 
can be applied. The text consists of 72 sentences and 
totals 1791 words; there are 84 possible paraphrases, 
over 45 of the sentences. 



In order to determine possible constraint values 
for real text, it is first necessary to evaluate the flex- 
ibility of the text: to what extent can the length be 
altered, say, or the readability changed? Choosing 
sets of paraphrases which maximise the relevant con- 
straint, regardless of the value of the cost function or 
the effect on other constraints, the results given in 
Table 2 were obtained. 

So at best it is possible, for this text, to reduce 
the number of words by about 15%, and the average 
sentence length by about 30%. This information is then 
used to set reasonable constraint limits. 
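
The two percentages follow directly from Table 2; a short check of the arithmetic (the helper name `percent_reduction` is illustrative):

```python
def percent_reduction(original: float, minimised: float) -> float:
    """Relative reduction from original to minimised, in percent."""
    return 100.0 * (original - minimised) / original

# Values taken from Table 2.
word_reduction = percent_reduction(1791, 1531)               # about 14.5
sentence_length_reduction = percent_reduction(24.88, 17.66)  # about 29.0
```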

One way in which the task of applying the model 
to actual text is more complicated than the example 
is in the need to set numeric values for the objec- 
tive function coefficients. In the example, because of 
the small number of objective function coefficients, 
it is generally possible to just compare the result 
of the function algebraically. As a first attempt 
at a numeric objective function, constant differences 
were assigned between the classes of textual change 
described in Section 3, and the approach 
was applied to the first 19 sentences of the Atlantic 
Monthly text. Modelling the problem as an 
optimisation one, combined with branch-and-bound 
techniques, reduced the search space by 41.5%, from $2^{19}$ 
possible solutions to 306828 candidates. 
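
The size of the reduction can be verified from the two figures given; the variable names below are illustrative.

```python
total_assignments = 2 ** 19      # one 0/1 decision per variable: 524288
candidates_examined = 306828     # figure reported above
reduction = 1 - candidates_examined / total_assignments   # about 0.415
```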

5 Conclusion 

The paper has drawn on diverse areas of linguistics 
and mathematics to present a nonetheless fairly nat- 
ural view of paraphrase as a mathematical optimi- 
sation problem. This phrasing of paraphrase as an 
optimisation problem has three main components. 
Firstly, three appropriate constraints have been cho- 
sen and modelled as constraint equations. Secondly, 
a method for quantifying the effects of paraphrase on 
text, and their expression as an optimisation objec- 
tive function, has been discussed. Thirdly, the model 
has been described with an application to a small 
example text given. Application to actual text has 
shown the extent to which the technique can be ap- 
plied: for example, the length constraint is not meant 
to mimic summarisation, but rather to enable the 
massaging of a text that is not too far from what 
is required. 

Current work involves a deeper application of the 
model to actual text: a larger number of constraints, 
more paraphrases, and an objective function which 
can be numerically evaluated. This then enables an 
analysis of text using the sensitivity analysis which 
is a corollary of linear programming, answering ques- 
tions such as: 

• What are the characteristics of elastic text, 
that is, one which responds a lot to small 
changes? 

• What is the sensitivity of text to changes in 
model assumptions, and would the same para- 
phrases be chosen given these changes? 

• What are the equivalence classes for the para- 
phrases used, that is, which paraphrases are in 
effect interchangeable? 